6 research outputs found

    Data generator for evaluating ETL process quality

    Obtaining the right set of data for evaluating the fulfillment of different quality factors in the extract-transform-load (ETL) process design is rather challenging. First, the real data might be out of reach due to privacy constraints, while manually providing a synthetic set of data is known to be a labor-intensive task that needs to take various combinations of process parameters into account. More importantly, a single dataset usually does not represent the evolution of data throughout the complete process lifespan, hence missing the plethora of possible test cases. To facilitate such a demanding task, in this paper we propose an automatic data generator (i.e., Bijoux). Starting from a given ETL process model, Bijoux extracts the semantics of data transformations, analyzes the constraints they imply over input data, and automatically generates testing datasets. Bijoux is highly modular and configurable, enabling end-users to generate datasets for a variety of interesting test scenarios (e.g., evaluating specific parts of an input ETL process design, with different input dataset sizes, different distributions of data, and different operation selectivities). We have developed a running prototype that implements the functionality of our data generation framework, and here we report our experimental findings showing the effectiveness and scalability of our approach. Peer reviewed. Postprint (author's final draft).
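    The generation idea the abstract describes — deriving input rows from the constraints an operation imposes and a configured target selectivity — can be illustrated with a minimal sketch. This is a hypothetical illustration, not Bijoux's actual code; the function name, row schema, and the `value < threshold` filter predicate are all assumptions:

    ```python
    import random

    def generate_filter_input(n_rows, threshold, selectivity, seed=0):
        """Generate synthetic rows so that a filter operation
        `value < threshold` passes exactly round(n_rows * selectivity) rows."""
        rng = random.Random(seed)
        n_pass = round(n_rows * selectivity)
        rows = []
        for i in range(n_rows):
            if i < n_pass:
                # satisfy the filter constraint: value strictly below the threshold
                value = rng.uniform(0.0, threshold * 0.999)
            else:
                # violate the constraint: value at or above the threshold
                value = rng.uniform(threshold, threshold * 2)
            rows.append({"id": i, "value": value})
        rng.shuffle(rows)  # avoid ordering artifacts in the generated dataset
        return rows
    ```

    With, say, 1,000 rows and a selectivity of 0.3, exactly 300 generated rows satisfy the filter, so the operation's observed selectivity matches the configured one.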

    Detection of ETL bottlenecks by using process mining

    No full text
    The increasing need for application benchmarking and testing requires large amounts of data. However, obtaining realistic data from industry for testing purposes is often impossible due to confidentiality issues and expensive data transfer over the network (i.e., the Internet). Hence, there is a gap between the need to benchmark and the lack of a common testing environment to achieve it. The scope of this thesis is to contribute to narrowing the above-presented gap by introducing a theoretical framework of data generation for the simulation of data processes. Therefore, we aim at generating input data and hence providing a common testing environment for testing and evaluating data processes. Specifically, we focus on generating data for ETL data processes by analyzing the semantics of the flow. The motivation comes from the fact that ETL processes are often time-consuming and error-prone. Therefore, it is of high importance to evaluate and benchmark them in order to identify bottlenecks and constantly improve their performance. Moreover, we introduce a layered architecture design for developing a prototype of the ETL data generation framework. In addition, we present a pilot tool developed for implementing the ETL data generation framework following the proposed architecture and the ETL semantics principle. As a conclusion to our work, we introduce the data generation approach and show its feasibility to generate workload scenarios useful for testing and benchmarking ETL processes.

    Bijoux: data generator for evaluating ETL process quality

    No full text
    Obtaining the right set of data for evaluating the fulfillment of different quality standards in the extract-transform-load (ETL) process design is rather challenging. First, the real data might be out of reach due to privacy constraints, while providing a synthetic set of data is known to be a labor-intensive task that needs to take various combinations of process parameters into account. Additionally, a single dataset usually does not represent the evolution of data throughout the complete process lifespan, hence missing the plethora of possible test cases. To facilitate such a demanding task, in this paper we propose an automatic data generator (i.e., Bijoux). Starting from a given ETL process model, Bijoux extracts the semantics of data transformations, analyzes the constraints they imply over data, and automatically generates testing datasets. At the same time, it considers different dataset and transformation characteristics (e.g., size, distribution, selectivity) in order to cover a variety of test scenarios. We report our experimental findings showing the effectiveness and scalability of our approach. Peer reviewed.

    Papers presented at the workshop on plasma edge theory in fusion devices, Augustusburg, G.D.R., 26th-30th April 1988

    No full text
    Contributions by JET authors. SIGLE. Available from British Library Document Supply Centre - DSC:4672.262 (JET-P--(88)24) / BLDSC - British Library Document Supply Centre. GB, United Kingdom.